Decoding Word Embeddings with Brain-Based Semantic Features

Authors

Emmanuele Chersoni, Enrico Santus, Chu-Ren Huang, Alessandro Lenci

Abstract

Word embeddings are vectorial semantic representations built with either counting or predicting techniques aimed at capturing shades of meaning from word co-occurrences. Since their introduction, these representations have been criticized for lacking interpretable dimensions. This property limits our understanding of the features they actually encode. Moreover, it contributes to the “black box” nature of the tasks in which they are used, since the reasons for word embedding performance often remain opaque to humans. In this contribution, we explore the semantic properties encoded in word embeddings by mapping them onto interpretable vectors, consisting of explicit and neurobiologically motivated semantic features (Binder et al. 2016). Our exploration takes into account different types of embeddings, including factorized count vectors and predict models (Skip-Gram, GloVe, etc.), as well as the most recent contextualized representations (i.e., ELMo and BERT). In our analysis, we first evaluate the quality of the mapping in a retrieval task, then we shed light on the semantic features that are better encoded by each embedding type. A large number of probing tasks is finally set up to assess how the original and the mapped embeddings perform in discriminating semantic categories. For each probing task, we identify the most relevant semantic features and show that there is a correlation between the embedding performance and how those features are encoded. This study sets itself as a step forward in understanding which aspects of meaning are captured by vector spaces, by proposing a new and simple method to carve human-interpretable semantic representations out of distributional vectors.
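The mapping the abstract describes can be pictured with a short sketch. The following is a minimal illustration, not the paper's own pipeline: it assumes ridge regression as the linear mapping, random arrays in place of real embeddings and Binder feature norms, and a simple top-1 nearest-neighbour criterion for the retrieval task.

```python
# Minimal sketch: map word embeddings onto interpretable Binder-style
# feature vectors with a linear model, then check retrieval quality.
# Assumptions (not from the paper): ridge regression as the mapping,
# toy arrays standing in for real embeddings and feature ratings.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
n_words, emb_dim, n_feats = 500, 300, 65   # 65 = Binder et al. (2016) features

E = rng.normal(size=(n_words, emb_dim))    # word embeddings (stand-in)
B = rng.normal(size=(n_words, n_feats))    # human feature ratings (stand-in)

train, test = slice(0, 400), slice(400, 500)
model = Ridge(alpha=1.0).fit(E[train], B[train])
pred = model.predict(E[test])              # predicted feature vectors

# Retrieval task: is the gold feature vector the nearest neighbour
# of the predicted one among the held-out words?
sims = cosine_similarity(pred, B[test])
top1 = (sims.argmax(axis=1) == np.arange(sims.shape[0])).mean()
print(f"top-1 retrieval accuracy: {top1:.2f}")
```

With real embeddings and feature norms in place of the random arrays, the same top-1 score gives a first read on how well the mapped space preserves word identity.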


Similar Articles

Enhancing Sensitivity Classification with Semantic Features Using Word Embeddings

Government documents must be reviewed to identify any sensitive information they may contain, before they can be released to the public. However, traditional paper-based sensitivity review processes are not practical for reviewing born-digital documents. Therefore, there is a timely need for automatic sensitivity classification techniques, to assist the digital sensitivity review process. Howev...


Text Segmentation based on Semantic Word Embeddings

We explore the use of semantic word embeddings [14, 16, 12] in text segmentation algorithms, including the C99 segmentation algorithm [3, 4] and new algorithms inspired by the distributed word vector representation. By developing a general framework for discussing a class of segmentation objectives, we study the effectiveness of greedy versus exact optimization approaches and suggest a new iter...
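As a rough illustration of the idea behind such algorithms (a simplification, not C99 or the paper's method): represent each sentence by the mean of its word vectors and greedily place boundaries where adjacent sentences are least similar. The data below are toy assumptions.

```python
# Greedy embedding-based segmentation sketch: boundaries fall at the
# weakest cosine similarities between adjacent sentence vectors.
import numpy as np

def segment(sent_vecs: np.ndarray, n_boundaries: int) -> list[int]:
    """Return indices i where a boundary falls between sentence i and i+1."""
    a, b = sent_vecs[:-1], sent_vecs[1:]
    sims = (a * b).sum(1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return sorted(np.argsort(sims)[:n_boundaries].tolist())

vecs = np.random.default_rng(1).normal(size=(10, 50))  # toy sentence vectors
print(segment(vecs, n_boundaries=2))
```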


AutoExtend: Combining Word Embeddings with Semantic Resources

We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings which incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an...
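AutoExtend itself learns synset and lexeme embeddings jointly through an autoencoder; the sketch below shows only the naive baseline such systems improve on, a synset vector taken as the average of its member words' embeddings. The toy vocabulary and synset id are assumptions.

```python
# Naive baseline (not AutoExtend): a synset vector as the average
# of the embeddings of the words that belong to it.
import numpy as np

word_vecs = {                      # toy embeddings (assumption)
    "car": np.array([1.0, 0.0]),
    "auto": np.array([0.8, 0.2]),
    "automobile": np.array([0.9, 0.1]),
}
synsets = {"car.n.01": ["car", "auto", "automobile"]}  # WordNet-style id

synset_vecs = {
    sid: np.mean([word_vecs[w] for w in words], axis=0)
    for sid, words in synsets.items()
}
print(synset_vecs["car.n.01"])
```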


Adjusting Word Embeddings with Semantic Intensity Orders

Semantic lexicons such as WordNet and PPDB have been used to improve the vector-based semantic representations of words by adjusting the word vectors. However, such lexicons lack semantic intensity information, inhibiting adjustment of vector spaces to better represent semantic intensity scales. In this work, we adjust word vectors using the semantic intensity information in addition to synonym...
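The synonym-based adjustment this abstract starts from can be sketched in the spirit of retrofitting (Faruqui et al. 2015); the paper's intensity-order extension is not reproduced here, and the two-word vocabulary is a toy assumption.

```python
# Retrofitting-style sketch: pull each vector toward its lexicon
# synonyms while keeping it close to its original position.
import numpy as np

def retrofit(vecs, synonyms, iters=10, alpha=1.0):
    new = {w: v.copy() for w, v in vecs.items()}
    for _ in range(iters):
        for w, nbrs in synonyms.items():
            nbrs = [n for n in nbrs if n in new]
            if not nbrs:
                continue
            # weighted average of synonym vectors and the original vector
            new[w] = (sum(new[n] for n in nbrs) + alpha * vecs[w]) / (len(nbrs) + alpha)
    return new

vecs = {"good": np.array([1.0, 0.0]), "great": np.array([0.0, 1.0])}
print(retrofit(vecs, {"good": ["great"], "great": ["good"]})["good"])
```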


Exploring Semantic Representation in Brain Activity Using Word Embeddings

In this paper, we utilize distributed word representations (i.e., word embeddings) to analyse the representation of semantics in brain activity. The brain activity data were recorded using functional magnetic resonance imaging (fMRI) when subjects were viewing words. First, we analysed the functional selectivity of different cortex areas by calculating the correlations between neural responses ...
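One standard way to relate fMRI responses to embeddings, shown below as a hedged sketch rather than the paper's exact analysis, is representational similarity analysis: correlate the pairwise word-distance structure of the brain data with that of the embedding space. The array shapes are arbitrary stand-ins.

```python
# Representational similarity analysis sketch: compare the pairwise
# distance structure of brain responses and of word embeddings.
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
brain = rng.normal(size=(60, 1000))   # 60 words x 1000 voxels (stand-in)
embed = rng.normal(size=(60, 300))    # 60 words x 300 dims   (stand-in)

rho, p = spearmanr(pdist(brain, "correlation"), pdist(embed, "cosine"))
print(f"RSA correlation: rho={rho:.3f}, p={p:.3g}")
```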



Journal

Journal title: Computational Linguistics

Year: 2021

ISSN: 1530-9312, 0891-2017

DOI: https://doi.org/10.1162/coli_a_00412